Use bit ops instead of integer modulo and divide in shaders #19994

atlv24 · 2025-07-07T00:50:05Z

Objective

Our shader compilation goes through many steps, and any of them can (and sometimes does) go wrong. We write our bespoke naga-oil flavor of WGSL, which gets processed into actual WGSL, which naga turns into naga IR, which naga then turns into HLSL/SPIR-V/MSL, which the driver turns into ISA, which the GPU hardware finally runs. Some layers are lossy and not very good for performance. In particular, naga->HLSL seems to output some unfortunate polyfills such as:

int naga_mod(int lhs, int rhs) {
    int divisor = ((lhs == int(-2147483647 - 1) & rhs == -1) | (rhs == 0)) ? 1 : rhs;
    return lhs - (lhs / divisor) * divisor;
}

in place of %, even for constant arguments. Some driver toolchains such as FXC then complain about this and seemingly do not realize they can optimize it.
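To make the polyfill's behavior concrete, here is a rough Rust port of the `naga_mod` function quoted above. It is only an illustration of what the guard does (the Rust spelling and the choice of `i32` are assumptions for this sketch), not naga's actual implementation.

```rust
/// Rust port of the HLSL `naga_mod` polyfill quoted above (illustrative only).
fn naga_mod(lhs: i32, rhs: i32) -> i32 {
    // Swap in a divisor of 1 for the two inputs where `/` is not well defined:
    // i32::MIN / -1 (overflow) and division by zero.
    let divisor = if (lhs == i32::MIN && rhs == -1) || rhs == 0 {
        1
    } else {
        rhs
    };
    // Truncating remainder, spelled as divide, multiply, subtract.
    lhs - (lhs / divisor) * divisor
}
```

Both guarded cases come out as 0 because the divisor becomes 1. The guard is evaluated per call in the generated source, so a constant right-hand side such as 64 still goes through the full select, divide, multiply, and subtract sequence unless a later compiler stage folds it away.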

This might actually end up mattering for CI times, but we'll see.

Solution

Make the lives of the tools easier by sacrificing some readability. The bit ops are what we intend to happen in these cases anyway, not a full-blown modulo, so in a way this is just being explicit about it.
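As a minimal sketch of the equivalences this relies on, assuming unsigned 32-bit values and a shift amount below 32 (the helper names `mod_pow2` and `div_pow2` are made up for this example):

```rust
/// `x % 2^n` for an unsigned `x`: keep only the low `n` bits.
fn mod_pow2(x: u32, n: u32) -> u32 {
    x & ((1u32 << n) - 1)
}

/// `x / 2^n` for an unsigned `x`: drop the low `n` bits.
fn div_pow2(x: u32, n: u32) -> u32 {
    x >> n
}
```

These identities hold for unsigned values only. For a negative signed value the truncating remainder and the mask disagree (in Rust, `-7 % 4` is `-3` while `-7 & 3` is `1`), so the replacement is only safe where the operand is known to be non-negative.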

Alternate solution considered

Add a naga IR processing step that replaces modulo by a constant power of two with the equivalent bit op, and does the same for divs and muls. This is more complicated and less explicit, though.
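For a sense of what such a pass would involve, here is a toy sketch over a made-up expression tree. The `Expr` type and `strength_reduce` function are hypothetical and are not naga's IR or API. The sketch assumes every value is an unsigned 32-bit integer and only handles modulo and division.

```rust
/// Hypothetical, minimal expression tree (not naga's IR). All values are u32.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(&'static str),
    Const(u32),
    Mod(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Shr(Box<Expr>, Box<Expr>),
}

/// Rewrites `x % 2^n` to `x & (2^n - 1)` and `x / 2^n` to `x >> n`, bottom-up.
fn strength_reduce(expr: Expr) -> Expr {
    match expr {
        Expr::Mod(lhs, rhs) => {
            let lhs = Box::new(strength_reduce(*lhs));
            match strength_reduce(*rhs) {
                Expr::Const(c) if c.is_power_of_two() => {
                    Expr::And(lhs, Box::new(Expr::Const(c - 1)))
                }
                rhs => Expr::Mod(lhs, Box::new(rhs)),
            }
        }
        Expr::Div(lhs, rhs) => {
            let lhs = Box::new(strength_reduce(*lhs));
            match strength_reduce(*rhs) {
                Expr::Const(c) if c.is_power_of_two() => {
                    Expr::Shr(lhs, Box::new(Expr::Const(c.trailing_zeros())))
                }
                rhs => Expr::Div(lhs, Box::new(rhs)),
            }
        }
        Expr::And(lhs, rhs) => Expr::And(
            Box::new(strength_reduce(*lhs)),
            Box::new(strength_reduce(*rhs)),
        ),
        Expr::Shr(lhs, rhs) => Expr::Shr(
            Box::new(strength_reduce(*lhs)),
            Box::new(strength_reduce(*rhs)),
        ),
        // Variables and constants are left unchanged.
        leaf => leaf,
    }
}
```

Even in this toy form the pass has to track types and signedness to stay correct, which hints at why the PR prefers writing the bit ops directly in the shader source.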

Testing

deferred_rendering (on all 3 modes), transmission, morph_targets, ssao, volumetric_fog

alice-i-cecile

Not a rendering expert, but I would appreciate comments above the bitops that show what the equivalent non-bitops are.

JMS55 · 2025-07-07T03:07:58Z

crates/bevy_pbr/src/meshlet/meshlet_mesh_material.wgsl

    let material_id = vertex_input / 3u;
+    let vertex_index = vertex_input - material_id * 3u;


This might be worse. I thought compilers can see consecutive % and / and combine the instructions into one thing(?)

Not when it's split into a function call that looks like

int naga_mod(int lhs, int rhs) {
    int divisor = ((lhs == int(-2147483647 - 1) & rhs == -1) | (rhs == 0)) ? 1 : rhs;
    return lhs - (lhs / divisor) * divisor;
}
// ...
let vertex_index = naga_mod(vertex_input, 3u);
let material_id = vertex_input / 3u;

This might be worse. I thought compilers can see consecutive % and / and combine the instructions into one thing(?)

If you play around on Godbolt you'll notice that compilers actually emit this QUOTIENT <- DIVIDEND / DIVISOR; REMAINDER <- DIVIDEND - QUOTIENT * DIVISOR idiom quite often at higher optimization levels, and less so at lower ones.

Not all division hardware (Goldschmidt comes to mind) computes both results at once (at least not as a ready-to-use integer remainder), and in general I trust shader compilers to do these optimizations on code way less than I do for regular compilers.
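The idiom discussed in this thread can be checked with a small Rust sketch. The function name `div_rem_3` is made up for this example, and the variable names simply follow the diff above.

```rust
/// Splits `vertex_input` with one division, then recovers the remainder
/// with a multiply and a subtract instead of a second `%`.
fn div_rem_3(vertex_input: u32) -> (u32, u32) {
    let material_id = vertex_input / 3;
    // Equivalent to `vertex_input % 3` for unsigned values.
    let vertex_index = vertex_input - material_id * 3;
    (material_id, vertex_index)
}
```

For unsigned inputs `material_id * 3` never exceeds `vertex_input`, so the subtraction cannot underflow, and the result matches `(x / 3, x % 3)` across the whole `u32` range.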

superdump · 2025-07-07T06:33:38Z

crates/bevy_core_pipeline/src/experimental/mip_generation/downsample_depth.wgsl

-    let sub_xy = remap_for_wave_reduction(local_invocation_index % 64u);
-    let x = sub_xy.x + 8u * ((local_invocation_index >> 6u) % 2u);
+    let sub_xy = remap_for_wave_reduction(local_invocation_index & 63u);
+    let x = sub_xy.x + 8u * ((local_invocation_index >> 6u) & 1u);
    let y = sub_xy.y + 8u * (local_invocation_index >> 7u);


8u is << 3u. Or does naga deal with that properly?
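As a rough check of the arithmetic in this hunk, the sketch below compares the old modulo form with a mask-and-shift form of the two offset terms. It leaves out `sub_xy` and `remap_for_wave_reduction`, whose bodies are not shown here. The function names and the 0..256 index range are assumptions for this example.

```rust
/// Offset terms as written before the PR (modulo and multiply).
fn offsets_mod(local_invocation_index: u32) -> (u32, u32) {
    let x_off = 8 * ((local_invocation_index >> 6) % 2);
    let y_off = 8 * (local_invocation_index >> 7);
    (x_off, y_off)
}

/// Same terms with `% 2` as `& 1` and `8 *` spelled as `<< 3`.
fn offsets_bits(local_invocation_index: u32) -> (u32, u32) {
    let x_off = ((local_invocation_index >> 6) & 1) << 3;
    let y_off = (local_invocation_index >> 7) << 3;
    (x_off, y_off)
}
```

The two forms agree for every index in the assumed range. This only shows that `8u *` and `<< 3u` are interchangeable arithmetically, and says nothing about which one a given GPU compiler handles better.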

superdump · 2025-07-07T06:34:01Z

crates/bevy_core_pipeline/src/experimental/mip_generation/downsample_depth.wgsl

-    let sub_xy = remap_for_wave_reduction(local_invocation_index % 64u);
-    let x = sub_xy.x + 8u * ((local_invocation_index >> 6u) % 2u);
+    let sub_xy = remap_for_wave_reduction(local_invocation_index & 63u);
+    let x = sub_xy.x + 8u * ((local_invocation_index >> 6u) & 1u);


8u is << 3u

superdump · 2025-07-07T06:41:07Z

crates/bevy_pbr/src/volumetric_fog/volumetric_fog.wgsl

@@ -103,6 +103,13 @@ fn henyey_greenstein(neg_LdotV: f32) -> f32 {
    return FRAC_4_PI * (1.0 - g * g) / (denom * sqrt(denom));
 }

+fn simple_wrap_3(index: i32) -> i32 {


Can index be >=6? If so then this is wrong, if not then I think this function should be named/commented to indicate it is special purpose.

It's only used in this file, and it's only called with numbers in the range 0-5. I called it simple_wrap_3, not implying it's modulo, because it is not an implementation of modulo, just something that works for this specific case.

Could you add a comment saying that it only works for values 0 to 5, which is fine for its use in this file?

My only nit is that this function signature should be unsigned as it's indexing into an array.

Comment could be something simple like // Helper for plane indices, only valid for 0-5
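The body of `simple_wrap_3` is not shown in this thread, so the following is only a guess at one shape that fits the description: a single conditional subtract that matches `% 3` for inputs 0 to 5. It uses an unsigned parameter as suggested in the nit above, whereas the diff declares `i32`.

```rust
/// Hypothetical body. Helper for plane indices, only valid for 0-5.
fn simple_wrap_3(index: u32) -> u32 {
    debug_assert!(index <= 5, "simple_wrap_3 only handles 0..=5");
    // One conditional subtract instead of a full `% 3`.
    if index >= 3 {
        index - 3
    } else {
        index
    }
}
```

With this shape an input of 6 would return 3 rather than 0, which is why the range restriction deserves a comment on the function.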

superdump

Mostly looks fine to me.

I noted early in my review that multiplication by powers of two is not changed to left shifts. Does naga handle this or is there no value to it?

Aside from that, just one comment about the simple_wrap_3 function to address.

atlv24 · 2025-07-07T06:56:05Z

Integer multiplication has a dedicated instruction on GPUs, so there's no need to replace it; bit shifts win us nothing in that case. Integer division and modulo are the really expensive ones: they are on the order of 100 cycles, whereas integer multiplies are usually a cycle or two.

jordanhalase · 2025-07-09T17:26:03Z

Not a rendering expert, but I would appreciate comments above the bitops that show what the equivalent non-bitops are.

I personally don't agree. I would agree if there were large bitop expressions involved that do more complex things, but this PR only uses single expressions like x >> n, x & 1, and x & (power_of_2 - 1) for "divide by 2^n", "split even and odd", and "keep in range 0..2^n".

Personally I think commenting these single expressions would just add noise. (But I may also be biased as I'm in firmware and bitops are everywhere.)

I noted early in my review that multiplication by powers of two is not changed to left shifts. Does naga handle this or is there no value to it?

GPU compilers can do fused multiply-add if there is an addition in the expression, which in this PR there is. Left-shifting may actually be less performant here. ~~If anything we should be using fma() here if naga supports it.~~ (fma() appears float only, but GPU may do integer fma as an optimization anyway? Either way I think *8 is better than <<3 here)

https://forums.developer.nvidia.com/t/why-shift-is-slower-than-integer-multiply-shift-integer-multiply/17395

Aside from the helper function everything LGTM.

alice-i-cecile · 2025-07-09T17:44:55Z

I would agree if there were large bitop expressions involved that do more complex things, but this PR only uses single expressions like x >> n, x & 1, and x & (power_of_2 - 1) for "divide by 2^n", "split even and odd", and "keep in range 0..2^n".

For reference, I had no idea reading these simple expressions what their equivalent non bit-ops operations were :) Definitely a skill issue (I have no use for bit operations in my ordinary work, and have a non-conventional background), but we already struggle getting new contributors for rendering code; I'm reluctant to make it more arcane.

atlv24 · 2025-07-09T17:49:49Z

I'm not bothering to document this until I can actually prove it's worthwhile on at least one platform; it would be wasted work otherwise. It didn't help CI runners.

Use bit ops instead of integer modulo and divide in shaders

e29d0d5

alice-i-cecile reviewed Jul 7, 2025

alice-i-cecile added the C-Bug (An unexpected or incorrect behavior), A-Rendering (Drawing game state to the screen), and C-Performance (A change motivated by improving speed, memory usage or compile times) labels Jul 7, 2025

github-project-automation bot added this to Rendering Jul 7, 2025

alice-i-cecile added the S-Needs-Benchmarking (This set of changes needs performance benchmarking to double-check that they help), X-Contentious (There are nontrivial implications that should be thought through), and S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) labels Jul 7, 2025

JMS55 reviewed Jul 7, 2025

Merge branch 'main' into ad/bit-op-shaders

72c9a37

github-actions bot mentioned this pull request Jul 7, 2025

19994 bevyengine/bevy-example-runner#160

Closed

superdump reviewed Jul 7, 2025
